Solving the Polysemy Problem of Persian Words Using Mutual Information Statistics
Abstract
In recent years, large monolingual, comparable, and parallel corpora have played a crucial role in solving various problems of computational linguistics, including machine translation, information retrieval, and other natural language processing tasks. This paper addresses the polysemy problem that arises when words are machine-translated into Persian. We use mutual information statistics obtained from a very large monolingual Persian corpus. Mutual information values are calculated from the co-occurrence frequencies of words and are used to measure the association between words. Using these statistics, the occurrence and co-occurrence frequencies of the different target-language equivalents of an ambiguous word are computed, and the most probable equivalent of each ambiguous word is selected. When the mutual information value is high, the word associations are strong and provide dependable evidence for translational disambiguation; when it is low, the associations are weak and less reliable. The method discussed in this paper can not only be applied directly in Persian-English machine translation systems but can also improve the effectiveness of retrieval tasks, especially cross-language information retrieval.
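The abstract does not state the exact mutual information formula, the co-occurrence window, or how candidate scores are combined, so the following is only a minimal sketch under assumptions of our own. It assumes pointwise mutual information, I(x, y) = log2(P(x, y) / (P(x) P(y))), with probabilities estimated from sentence-level co-occurrence counts in a target-language (Persian) corpus, and it scores each candidate equivalent by its summed mutual information with the surrounding context words. The toy corpus of transliterated Persian tokens (for example "bank" and "sahel" as two possible Persian equivalents of English "bank"), and the function names cooccurrence_counts, pmi, and choose_equivalent, are hypothetical illustrations, not the paper's data or implementation.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count each word and each unordered word pair at most once per sentence."""
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        words = set(sentence.split())
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))
    return word_counts, pair_counts, len(sentences)


def pmi(x, y, word_counts, pair_counts, n):
    """Pointwise mutual information log2(P(x,y) / (P(x) P(y))), or None if x and y never co-occur."""
    joint = pair_counts[frozenset((x, y))]
    if joint == 0:
        return None  # no co-occurrence evidence for this pair
    return math.log2((joint / n) / ((word_counts[x] / n) * (word_counts[y] / n)))


def choose_equivalent(candidates, context, word_counts, pair_counts, n):
    """Return the candidate equivalent with the highest summed PMI to the context words."""
    def score(candidate):
        values = (pmi(candidate, c, word_counts, pair_counts, n) for c in context)
        return sum(v for v in values if v is not None)  # unseen pairs contribute 0
    return max(candidates, key=score)


# Hypothetical toy "Persian" corpus in transliteration:
# pul = money, hesab = account, vam = loan, rudkhane = river, ab = water, mahi = fish.
corpus = [
    "pul bank hesab",
    "bank pul vam",
    "sahel rudkhane ab",
    "rudkhane sahel mahi",
    "bank hesab vam",
]

word_counts, pair_counts, n = cooccurrence_counts(corpus)
# English "bank" is ambiguous: Persian "bank" (financial) or "sahel" (shore).
print(choose_equivalent(["bank", "sahel"], ["rudkhane", "ab"], word_counts, pair_counts, n))  # prints sahel
```

Pairs that never co-occur are skipped rather than scored as negative infinity, which is one simple way to handle sparse counts. Over a real corpus, some smoothing or a minimum-frequency threshold would likely also be needed, because pointwise mutual information tends to overestimate the association of rare words.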